Our inspiration came from the Kaggle competition Instacart Market Basket Analysis, which is also the source of our datasets. Instacart is a grocery ordering and delivery application. The company provides an anonymized dataset containing a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, it provides between 4 and 100 of their orders, with the sequence of products purchased in each order, the day of the week and hour of day the order was placed, and a relative measure of time between orders (each dataset is described in detail below).
Instacart invited competition participants to test models that predict which products a user will buy again, try for the first time, or add to the cart next during a session; such tasks may call for models such as XGBoost, word2vec and Annoy.
Predicting repurchases and the day an order will be placed are popular and useful tasks for e-commerce companies. For example, Amazon holds a patent on “anticipatory shipping”, which predicts what customers want to buy and when, and ships packages even before the customers have placed an order. Such predictions allow a company to substantially optimize logistics, staff and equipment resources, and inventory arrangement, which helps it decrease costs and increase profit. However, this type of prediction also requires much more information about customer behavior, such as the items customers have searched for, how long a user’s cursor hovers over a product, the number of clicks, the purchase conversion rate of those clicks, add-to-cart and save-to-collection actions, and so on.
In our case, because the available information is limited and we would like to apply the models we have learnt in the course, we choose to predict the day of the week on which an order will be placed. This prediction could then serve as an additional predictor for demand forecasting, which helps guide decisions such as inventory arrangement on an e-commerce platform.
Overall, we build a new dataset from the files downloaded from the competition website, and assume that:
Thus, our research questions will be:
1. On what day of the week will a given order be placed?
   For this question, we will use supervised methods: a Classification Tree and Multiple Logistic Regression.
2. Are there any common components between departments or aisles?
   For this question, we will use unsupervised methods: PCA and Clustering.
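To make the Classification Tree more concrete, the sketch below shows how a tree could choose a single split using Gini impurity, one common splitting criterion. It is a minimal Python illustration, not our fitted model: the feature (order_hour_of_day), the toy labels, and the helper names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best binary split x <= threshold.

    Candidate thresholds are midpoints between consecutive distinct values of xs.
    """
    values = sorted(set(xs))
    best_threshold, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Impurity of the two child nodes, weighted by their sizes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: order_hour_of_day as the feature, order_dow as the class
hours = [8, 9, 10, 18, 19, 20]
dows = [1, 1, 1, 5, 5, 5]
print(best_split(hours, dows))  # (14.0, 0.0)
```

A full tree repeats this search over every candidate feature and then recursively inside each child node until a stopping rule is met.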
orders (3.4m rows, 206k users):
* order_id: order identifier
* user_id: customer identifier
* eval_set: which evaluation set this order belongs in (see SET described below)
* order_number: the order sequence number for this user (1 = first, n = nth)
* order_dow: the day of the week the order was placed on
* order_hour_of_day: the hour of the day the order was placed on
* days_since_prior: days since the last order, capped at 30 (with NAs for order_number = 1)

products (50k rows):
* product_id: product identifier
* product_name: name of the product
* aisle_id: foreign key
* department_id: foreign key

aisles (134 rows):
* aisle_id: aisle identifier
* aisle: the name of the aisle

departments (21 rows):
* department_id: department identifier
* department: the name of the department

order_products__SET (30m+ rows):
* order_id: foreign key
* product_id: foreign key
* add_to_cart_order: order in which each product was added to cart
* reordered: 1 if this product has been ordered by this user in the past, 0 otherwise

where SET is one of the three following evaluation sets (eval_set in orders):
* "prior": orders prior to that user’s most recent order (~3.2m orders)
* "train": training data supplied to participants (~131k orders)
* "test": test data reserved for machine learning competitions (~75k orders)
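The foreign keys above are what allow the order lines to be combined with orders, products, aisles and departments, as in the joined table later in this report. A minimal Python sketch, assuming the tables fit in memory as dictionaries keyed by their identifiers; the product, aisle and department rows are copied from the tables in this report, while order 999 and its day and hour are hypothetical.

```python
# A few rows copied from the products, aisles and departments tables in this report
products = {  # product_id -> (product_name, aisle_id, department_id)
    1: ("Chocolate Sandwich Cookies", 61, 19),
    9: ("Light Strawberry Blueberry Yogurt", 120, 16),
    45: ("European Cucumber", 83, 4),
}
aisles = {61: "cookies cakes", 120: "yogurt", 83: "fresh vegetables"}
departments = {19: "snacks", 16: "dairy eggs", 4: "produce"}

# Hypothetical order: order_id 999 and its (order_dow, order_hour_of_day) are made up
orders = {999: (0, 11)}
order_products = [  # (order_id, product_id, add_to_cart_order, reordered)
    (999, 45, 1, 0),
    (999, 1, 2, 1),
    (999, 9, 3, 0),
]


def join_order_lines(lines):
    """Follow the foreign keys: order_products -> orders/products -> aisles/departments."""
    rows = []
    for order_id, product_id, _add_to_cart, _reordered in lines:
        order_dow, order_hour = orders[order_id]
        _name, aisle_id, department_id = products[product_id]
        rows.append({
            "order_id": order_id,
            "order_dow": order_dow,
            "order_hour_of_day": order_hour,
            "aisle": aisles[aisle_id],
            "department": departments[department_id],
        })
    return rows


for row in join_order_lines(order_products):
    print(row)
```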
#> [1] 0
| aisle_id | aisle |
|---|---|
| 1 | prepared soups salads |
| 2 | specialty cheeses |
| 3 | energy granola bars |
| 4 | instant foods |
| 5 | marinades meat preparation |
| 6 | other |
| 7 | packaged meat |
| 8 | bakery desserts |
| 9 | pasta sauce |
| 10 | kitchen supplies |
| 11 | cold flu allergy |
| 12 | fresh pasta |
| 13 | prepared meals |
| 14 | tofu meat alternatives |
| 15 | packaged seafood |
| 16 | fresh herbs |
| 17 | baking ingredients |
| 18 | bulk dried fruits vegetables |
| 19 | oils vinegars |
| 20 | oral hygiene |
| 21 | packaged cheese |
| 22 | hair care |
| 23 | popcorn jerky |
| 24 | fresh fruits |
| 25 | soap |
| 26 | coffee |
| 27 | beers coolers |
| 28 | red wines |
| 29 | honeys syrups nectars |
| 30 | latino foods |
| 31 | refrigerated |
| 32 | packaged produce |
| 33 | kosher foods |
| 34 | frozen meat seafood |
| 35 | poultry counter |
| 36 | butter |
| 37 | ice cream ice |
| 38 | frozen meals |
| 39 | seafood counter |
| 40 | dog food care |
| 41 | cat food care |
| 42 | frozen vegan vegetarian |
| 43 | buns rolls |
| 44 | eye ear care |
| 45 | candy chocolate |
| 46 | mint gum |
| 47 | vitamins supplements |
| 48 | breakfast bars pastries |
| 49 | packaged poultry |
| 50 | fruit vegetable snacks |
| 51 | preserved dips spreads |
| 52 | frozen breakfast |
| 53 | cream |
| 54 | paper goods |
| 55 | shave needs |
| 56 | diapers wipes |
| 57 | granola |
| 58 | frozen breads doughs |
| 59 | canned meals beans |
| 60 | trash bags liners |
| 61 | cookies cakes |
| 62 | white wines |
| 63 | grains rice dried goods |
| 64 | energy sports drinks |
| 65 | protein meal replacements |
| 66 | asian foods |
| 67 | fresh dips tapenades |
| 68 | bulk grains rice dried goods |
| 69 | soup broth bouillon |
| 70 | digestion |
| 71 | refrigerated pudding desserts |
| 72 | condiments |
| 73 | facial care |
| 74 | dish detergents |
| 75 | laundry |
| 76 | indian foods |
| 77 | soft drinks |
| 78 | crackers |
| 79 | frozen pizza |
| 80 | deodorants |
| 81 | canned jarred vegetables |
| 82 | baby accessories |
| 83 | fresh vegetables |
| 84 | milk |
| 85 | food storage |
| 86 | eggs |
| 87 | more household |
| 88 | spreads |
| 89 | salad dressing toppings |
| 90 | cocoa drink mixes |
| 91 | soy lactosefree |
| 92 | baby food formula |
| 93 | breakfast bakery |
| 94 | tea |
| 95 | canned meat seafood |
| 96 | lunch meat |
| 97 | baking supplies decor |
| 98 | juice nectars |
| 99 | canned fruit applesauce |
| 100 | missing |
| 101 | air fresheners candles |
| 102 | baby bath body care |
| 103 | ice cream toppings |
| 104 | spices seasonings |
| 105 | doughs gelatins bake mixes |
| 106 | hot dogs bacon sausage |
| 107 | chips pretzels |
| 108 | other creams cheeses |
| 109 | skin care |
| 110 | pickled goods olives |
| 111 | plates bowls cups flatware |
| 112 | bread |
| 113 | frozen juice |
| 114 | cleaning products |
| 115 | water seltzer sparkling water |
| 116 | frozen produce |
| 117 | nuts seeds dried fruit |
| 118 | first aid |
| 119 | frozen dessert |
| 120 | yogurt |
| 121 | cereal |
| 122 | meat counter |
| 123 | packaged vegetables fruits |
| 124 | spirits |
| 125 | trail mix snack mix |
| 126 | feminine care |
| 127 | body lotions soap |
| 128 | tortillas flat bread |
| 129 | frozen appetizers sides |
| 130 | hot cereal pancake mixes |
| 131 | dry pasta |
| 132 | beauty |
| 133 | muscles joints pain relief |
| 134 | specialty wines champagnes |
#> [1] 0
| department_id | department |
|---|---|
| 1 | frozen |
| 2 | other |
| 3 | bakery |
| 4 | produce |
| 5 | alcohol |
| 6 | international |
| 7 | beverages |
| 8 | pets |
| 9 | dry goods pasta |
| 10 | bulk |
| 11 | personal care |
| 12 | meat seafood |
| 13 | pantry |
| 14 | breakfast |
| 15 | canned goods |
| 16 | dairy eggs |
| 17 | household |
| 18 | babies |
| 19 | snacks |
| 20 | deli |
| 21 | missing |
#> [1] 0
| product_id | product_name | aisle_id | department_id |
|---|---|---|---|
| 1 | Chocolate Sandwich Cookies | 61 | 19 |
| 2 | All-Seasons Salt | 104 | 13 |
| 3 | Robust Golden Unsweetened Oolong Tea | 94 | 7 |
| 4 | Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce | 38 | 1 |
| 5 | Green Chile Anytime Sauce | 5 | 13 |
| 6 | Dry Nose Oil | 11 | 11 |
| 7 | Pure Coconut Water With Orange | 98 | 7 |
| 8 | Cut Russet Potatoes Steam N’ Mash | 116 | 1 |
| 9 | Light Strawberry Blueberry Yogurt | 120 | 16 |
| 10 | Sparkling Orange Juice & Prickly Pear Beverage | 115 | 7 |
| 11 | Peach Mango Juice | 31 | 7 |
| 12 | Chocolate Fudge Layer Cake | 119 | 1 |
| 13 | Saline Nasal Mist | 11 | 11 |
| 14 | Fresh Scent Dishwasher Cleaner | 74 | 17 |
| 15 | Overnight Diapers Size 6 | 56 | 18 |
| 16 | Mint Chocolate Flavored Syrup | 103 | 19 |
| 17 | Rendered Duck Fat | 35 | 12 |
| 18 | Pizza for One Suprema Frozen Pizza | 79 | 1 |
| 19 | Gluten Free Quinoa Three Cheese & Mushroom Blend | 63 | 9 |
| 20 | Pomegranate Cranberry & Aloe Vera Enrich Drink | 98 | 7 |
| 21 | Small & Medium Dental Dog Treats | 40 | 8 |
| 22 | Fresh Breath Oral Rinse Mild Mint | 20 | 11 |
| 23 | Organic Turkey Burgers | 49 | 12 |
| 24 | Tri-Vi-Sol® Vitamins A-C-and D Supplement Drops for Infants | 47 | 11 |
| 25 | Salted Caramel Lean Protein & Fiber Bar | 3 | 19 |
| 26 | Fancy Feast Trout Feast Flaked Wet Cat Food | 41 | 8 |
| 27 | Complete Spring Water Foaming Antibacterial Hand Wash | 127 | 11 |
| 28 | Wheat Chex Cereal | 121 | 14 |
| 29 | Fresh Cut Golden Sweet No Salt Added Whole Kernel Corn | 81 | 15 |
| 30 | Three Cheese Ziti, Marinara with Meatballs | 38 | 1 |
| 31 | White Pearl Onions | 123 | 4 |
| 32 | Nacho Cheese White Bean Chips | 107 | 19 |
| 33 | Organic Spaghetti Style Pasta | 131 | 9 |
| 34 | Peanut Butter Cereal | 121 | 14 |
| 35 | Italian Herb Porcini Mushrooms Chicken Sausage | 106 | 12 |
| 36 | Traditional Lasagna with Meat Sauce Savory Italian Recipes | 38 | 1 |
| 37 | Noodle Soup Mix With Chicken Broth | 69 | 15 |
| 38 | Ultra Antibacterial Dish Liquid | 100 | 21 |
| 39 | Daily Tangerine Citrus Flavored Beverage | 64 | 7 |
| 40 | Beef Hot Links Beef Smoked Sausage With Chile Peppers | 106 | 12 |
| 41 | Organic Sourdough Einkorn Crackers Rosemary | 78 | 19 |
| 42 | Biotin 1000 mcg | 47 | 11 |
| 43 | Organic Clementines | 123 | 4 |
| 44 | Sparkling Raspberry Seltzer | 115 | 7 |
| 45 | European Cucumber | 83 | 4 |
| 46 | Raisin Cinnamon Bagels 5 count | 58 | 1 |
| 47 | Onion Flavor Organic Roasted Seaweed Snack | 66 | 6 |
| 48 | School Glue, Washable, No Run | 87 | 17 |
| 49 | Vegetarian Grain Meat Sausages Italian - 4 CT | 14 | 20 |
| 50 | Pumpkin Muffin Mix | 105 | 13 |
#> [1] 0
| order_id | product_id | add_to_cart_order | reordered |
|---|---|---|---|
| 1 | 49302 | 1 | 1 |
| 1 | 11109 | 2 | 1 |
| 1 | 10246 | 3 | 0 |
| 1 | 49683 | 4 | 0 |
| 1 | 43633 | 5 | 1 |
| 1 | 13176 | 6 | 0 |
| 1 | 47209 | 7 | 0 |
| 1 | 22035 | 8 | 1 |
| 36 | 39612 | 1 | 0 |
| 36 | 19660 | 2 | 1 |
| 36 | 49235 | 3 | 0 |
| 36 | 43086 | 4 | 1 |
| 36 | 46620 | 5 | 1 |
| 36 | 34497 | 6 | 1 |
| 36 | 48679 | 7 | 1 |
| 36 | 46979 | 8 | 1 |
| 38 | 11913 | 1 | 0 |
| 38 | 18159 | 2 | 0 |
| 38 | 4461 | 3 | 0 |
| 38 | 21616 | 4 | 1 |
| 38 | 23622 | 5 | 0 |
| 38 | 32433 | 6 | 0 |
| 38 | 28842 | 7 | 0 |
| 38 | 42625 | 8 | 0 |
| 38 | 39693 | 9 | 0 |
| 96 | 20574 | 1 | 1 |
| 96 | 30391 | 2 | 0 |
| 96 | 40706 | 3 | 1 |
| 96 | 25610 | 4 | 0 |
| 96 | 27966 | 5 | 1 |
| 96 | 24489 | 6 | 1 |
| 96 | 39275 | 7 | 1 |
| 98 | 8859 | 1 | 1 |
| 98 | 19731 | 2 | 1 |
| 98 | 43654 | 3 | 1 |
| 98 | 13176 | 4 | 1 |
| 98 | 4357 | 5 | 1 |
| 98 | 37664 | 6 | 1 |
| 98 | 34065 | 7 | 1 |
| 98 | 35951 | 8 | 1 |
| 98 | 43560 | 9 | 1 |
| 98 | 9896 | 10 | 1 |
| 98 | 27509 | 11 | 1 |
| 98 | 15455 | 12 | 1 |
| 98 | 27966 | 13 | 1 |
| 98 | 47601 | 14 | 1 |
| 98 | 40396 | 15 | 1 |
| 98 | 35042 | 16 | 1 |
| 98 | 40986 | 17 | 1 |
| 98 | 1939 | 18 | 1 |
#> [1] 206209
#> [1] 206209
| order_id | order_dow | order_hour_of_day |
|---|---|---|
| 1187899 | 4 | 8 |
| 1492625 | 1 | 11 |
| 2196797 | 0 | 11 |
| 525192 | 2 | 11 |
| 880375 | 1 | 14 |
| 1094988 | 6 | 10 |
| 1822501 | 0 | 19 |
| 1827621 | 0 | 21 |
| 2316178 | 2 | 19 |
| 2180313 | 3 | 10 |
| 2461523 | 6 | 9 |
| 1854765 | 1 | 12 |
| 3402036 | 1 | 12 |
| 965160 | 0 | 16 |
| 2614670 | 5 | 14 |
| 3110252 | 4 | 11 |
| 62370 | 2 | 13 |
| 698604 | 4 | 13 |
| 1524161 | 0 | 13 |
| 3173750 | 0 | 9 |
| 2032076 | 0 | 20 |
| 2803975 | 0 | 11 |
| 1864787 | 5 | 11 |
| 2436259 | 0 | 12 |
| 1947848 | 4 | 20 |
| 2906490 | 4 | 22 |
| 2924697 | 5 | 18 |
| 519514 | 4 | 12 |
| 1750084 | 3 | 9 |
| 1647290 | 4 | 16 |
| 3088145 | 2 | 10 |
| 39325 | 2 | 18 |
| 13318 | 1 | 9 |
| 1651215 | 0 | 12 |
| 1019719 | 2 | 12 |
| 2989905 | 6 | 8 |
| 2639013 | 0 | 13 |
| 1072954 | 6 | 17 |
| 34647 | 3 | 19 |
| 2757217 | 0 | 11 |
| 669729 | 5 | 12 |
| 3038639 | 5 | 13 |
| 2608424 | 2 | 14 |
| 482516 | 4 | 7 |
| 3294399 | 4 | 8 |
| 1700658 | 6 | 11 |
| 21708 | 0 | 6 |
| 2178718 | 2 | 8 |
| 1734166 | 5 | 18 |
| 859654 | 1 | 10 |
In the left chart (order_dow), we observe that Sundays and Mondays are the most frequent ordering days compared with the rest of the week, and in the right chart (order_hour_of_day), we note a high demand for orders between 9am and 6pm.
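The counts behind these two charts amount to a frequency table of order_dow and the share of orders in a time window. A minimal Python sketch on the first ten rows of the orders sample above; it assumes, as the text does, that order_dow = 0 is Sunday, and treats "between 9am and 6pm" as hours 9 to 17 (our assumption). Ten rows are far too few to confirm the patterns, which come from the full data.

```python
from collections import Counter

# (order_dow, order_hour_of_day) of the first ten rows of the orders sample above
sample = [(4, 8), (1, 11), (0, 11), (2, 11), (1, 14),
          (6, 10), (0, 19), (0, 21), (2, 19), (3, 10)]

# Assumption: order_dow = 0 is Sunday, as in the text above
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def orders_per_day(rows):
    """Number of orders for each order_dow value 0..6 (days with no orders get 0)."""
    counts = Counter(dow for dow, _hour in rows)
    return [counts[day] for day in range(7)]


def share_between(rows, start_hour=9, end_hour=18):
    """Share of orders placed at start_hour <= hour < end_hour (9:00 to 17:59 by default)."""
    inside = sum(1 for _dow, hour in rows if start_hour <= hour < end_hour)
    return inside / len(rows)


per_day = orders_per_day(sample)
busiest = DAY_NAMES[per_day.index(max(per_day))]
print(per_day, busiest, share_between(sample))  # [3, 2, 2, 1, 1, 0, 1] Sunday 0.6
```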
| order_id | order_dow | order_hour_of_day | aisle_id | aisle | department_id | department |
|---|---|---|---|---|---|---|
| 1187899 | 4 | 8 | 77 | soft drinks | 7 | beverages |
| 1187899 | 4 | 8 | 21 | packaged cheese | 16 | dairy eggs |
| 1187899 | 4 | 8 | 120 | yogurt | 16 | dairy eggs |
| 1187899 | 4 | 8 | 54 | paper goods | 17 | household |
| 1187899 | 4 | 8 | 45 | candy chocolate | 19 | snacks |
| 1187899 | 4 | 8 | 117 | nuts seeds dried fruit | 19 | snacks |
| 1187899 | 4 | 8 | 121 | cereal | 14 | breakfast |
| 1187899 | 4 | 8 | 23 | popcorn jerky | 19 | snacks |
| 1187899 | 4 | 8 | 84 | milk | 16 | dairy eggs |
| 1187899 | 4 | 8 | 53 | cream | 16 | dairy eggs |
| 1187899 | 4 | 8 | 77 | soft drinks | 7 | beverages |
| 1492625 | 1 | 11 | 96 | lunch meat | 20 | deli |
| 1492625 | 1 | 11 | 58 | frozen breads doughs | 1 | frozen |
| 1492625 | 1 | 11 | 107 | chips pretzels | 19 | snacks |
| 1492625 | 1 | 11 | 23 | popcorn jerky | 19 | snacks |
| 1492625 | 1 | 11 | 24 | fresh fruits | 4 | produce |
| 1492625 | 1 | 11 | 24 | fresh fruits | 4 | produce |
| 1492625 | 1 | 11 | 24 | fresh fruits | 4 | produce |
| 1492625 | 1 | 11 | 24 | fresh fruits | 4 | produce |
| 1492625 | 1 | 11 | 24 | fresh fruits | 4 | produce |
| 1492625 | 1 | 11 | 24 | fresh fruits | 4 | produce |
| 1492625 | 1 | 11 | 24 | fresh fruits | 4 | produce |
| 1492625 | 1 | 11 | 91 | soy lactosefree | 16 | dairy eggs |
| 1492625 | 1 | 11 | 46 | mint gum | 19 | snacks |
| 1492625 | 1 | 11 | 96 | lunch meat | 20 | deli |
| 1492625 | 1 | 11 | 80 | deodorants | 11 | personal care |
| 1492625 | 1 | 11 | 1 | prepared soups salads | 20 | deli |
| 1492625 | 1 | 11 | 38 | frozen meals | 1 | frozen |
| 1492625 | 1 | 11 | 38 | frozen meals | 1 | frozen |
| 1492625 | 1 | 11 | 38 | frozen meals | 1 | frozen |
| 1492625 | 1 | 11 | 38 | frozen meals | 1 | frozen |
| 1492625 | 1 | 11 | 38 | frozen meals | 1 | frozen |
| 1492625 | 1 | 11 | 38 | frozen meals | 1 | frozen |
| 1492625 | 1 | 11 | 38 | frozen meals | 1 | frozen |
| 1492625 | 1 | 11 | 69 | soup broth bouillon | 15 | canned goods |
| 1492625 | 1 | 11 | 37 | ice cream ice | 1 | frozen |
| 1492625 | 1 | 11 | 37 | ice cream ice | 1 | frozen |
| 1492625 | 1 | 11 | 37 | ice cream ice | 1 | frozen |
| 1492625 | 1 | 11 | 117 | nuts seeds dried fruit | 19 | snacks |
| 1492625 | 1 | 11 | 3 | energy granola bars | 19 | snacks |
| 1492625 | 1 | 11 | 69 | soup broth bouillon | 15 | canned goods |
| 1492625 | 1 | 11 | 69 | soup broth bouillon | 15 | canned goods |
| 2196797 | 0 | 11 | 29 | honeys syrups nectars | 13 | pantry |
| 2196797 | 0 | 11 | 24 | fresh fruits | 4 | produce |
| 2196797 | 0 | 11 | 21 | packaged cheese | 16 | dairy eggs |
| 2196797 | 0 | 11 | 66 | asian foods | 6 | international |
| 2196797 | 0 | 11 | 101 | air fresheners candles | 17 | household |
| 2196797 | 0 | 11 | 83 | fresh vegetables | 4 | produce |
| 2196797 | 0 | 11 | 66 | asian foods | 6 | international |
| 2196797 | 0 | 11 | 123 | packaged vegetables fruits | 4 | produce |
Top 10 aisles by number of purchases
| aisle | department | total_order |
|---|---|---|
| fresh vegetables | produce | 150609 |
| fresh fruits | produce | 150473 |
| packaged vegetables fruits | produce | 78493 |
| yogurt | dairy eggs | 55240 |
| packaged cheese | dairy eggs | 41699 |
| water seltzer sparkling water | beverages | 36617 |
| milk | dairy eggs | 32644 |
| chips pretzels | snacks | 31269 |
| soy lactosefree | dairy eggs | 26240 |
| bread | bakery | 23635 |
Top 10 departments by number of purchases
| department | total_order |
|---|---|
| produce | 409087 |
| dairy eggs | 217051 |
| snacks | 118862 |
| beverages | 114046 |
| frozen | 100426 |
| pantry | 81242 |
| bakery | 48394 |
| canned goods | 46799 |
| deli | 44291 |
| dry goods pasta | 38713 |
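Both tables rank groups by how often they appear among purchased products. A minimal Python sketch, assuming total_order counts product lines (rows of order_products) rather than distinct orders; that interpretation is ours. The input is order 1187899 from the joined table above, so the counts are illustrative only.

```python
from collections import Counter

# (aisle, department) of each product line of order 1187899, copied from the joined table above
order_lines = [
    ("soft drinks", "beverages"), ("packaged cheese", "dairy eggs"),
    ("yogurt", "dairy eggs"), ("paper goods", "household"),
    ("candy chocolate", "snacks"), ("nuts seeds dried fruit", "snacks"),
    ("cereal", "breakfast"), ("popcorn jerky", "snacks"),
    ("milk", "dairy eggs"), ("cream", "dairy eggs"),
    ("soft drinks", "beverages"),
]


def top_n(labels, n=10):
    """Most frequent labels with their counts, largest first."""
    return Counter(labels).most_common(n)


# Top aisle (kept together with its department, as in the aisle table)
print(top_n(order_lines, 1))  # [(('soft drinks', 'beverages'), 2)]
# Top departments
print(top_n([dept for _aisle, dept in order_lines], 2))  # [('dairy eggs', 4), ('snacks', 3)]
```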
Sales Patterns
Here, we examine the sales patterns in more depth by splitting them by department, starting with the weekly sales pattern.
From these graphs, we observe the following patterns:
* Although the first chart shows that purchases usually peak on Sunday and Monday, alcohol is an exception: its orders increase slightly from a trough on Monday, reach a peak on Friday, and then decrease sharply on Saturday.
* The other departments share a similar pattern: orders decrease from the peak on Sunday and start increasing again on Friday.
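Comparing weekly patterns across departments of very different sizes (produce versus alcohol, for example) is easier once each department’s counts are converted into shares of its own weekly total. A minimal Python sketch of that normalization, using the two orders from the joined table above with order_dow 4 and 0; two orders say nothing about real weekly patterns, and the helper name `weekly_profile` is hypothetical.

```python
from collections import defaultdict

# Departments of the product lines in two orders from the joined table above:
# order 1187899 (order_dow = 4) and order 2196797 (order_dow = 0)
order_1187899 = ["beverages", "dairy eggs", "dairy eggs", "household", "snacks", "snacks",
                 "breakfast", "snacks", "dairy eggs", "dairy eggs", "beverages"]
order_2196797 = ["pantry", "produce", "dairy eggs", "international", "household",
                 "produce", "international", "produce"]
pairs = [(4, d) for d in order_1187899] + [(0, d) for d in order_2196797]


def weekly_profile(pairs):
    """Share of each department's product lines falling on each order_dow (0..6)."""
    counts = defaultdict(lambda: [0] * 7)
    for dow, department in pairs:
        counts[department][dow] += 1
    profiles = {}
    for department, row in counts.items():
        total = sum(row)
        profiles[department] = [c / total for c in row]
    return profiles


profiles = weekly_profile(pairs)
print(profiles["dairy eggs"])  # [0.2, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0]
```

Because every profile sums to 1, a small department such as alcohol can be plotted on the same scale as produce and compared by the shape of its week rather than its volume.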
We analyze the association between the numbers of orders from different departments.
PCA by department
PCA summarizes how variables vary together. It can be computed from either the correlation matrix (scaled variables) or the covariance matrix (unscaled variables). In our analysis, we focus on the relationship between the number of orders from each department and the day of the week on which users purchase. Thus, we base our main PCA on the unscaled data, i.e. the covariance matrix. However, it is also interesting to compare the results of the scaled and unscaled PCAs, so we also perform the PCA with correlations.
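The choice between the two matrices matters because, on the covariance matrix, a variable with a large variance can dominate the first component, whereas on the correlation matrix every variable starts with unit variance. A minimal pure-Python sketch with hypothetical counts for five users and three departments (none of these numbers come from our data); it uses power iteration to find only the leading eigenvalue, which is enough to compare the share of variance the first component explains under each choice.

```python
import math


def covariance_matrix(columns):
    """Sample covariance matrix of equally long columns (one column per variable)."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [[sum((a - mi) * (b - mj) for a, b in zip(ci, cj)) / (n - 1)
             for cj, mj in zip(columns, means)]
            for ci, mi in zip(columns, means)]


def correlation_matrix(cov):
    """Rescale a covariance matrix so that every variable has unit variance."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


def first_component_share(m, iterations=500):
    """Share of total variance explained by the first principal component of m.

    Power iteration finds the largest eigenvalue of the symmetric matrix m;
    the total variance is the trace of m (the sum of all its eigenvalues).
    """
    p = len(m)
    v = [1.0] * p  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
    largest = sum(vi * mvi for vi, mvi in zip(v, mv))  # Rayleigh quotient
    trace = sum(m[i][i] for i in range(p))
    return largest / trace


# Hypothetical order counts for five users in three departments
produce = [10, 2, 8, 15, 5]
dairy_eggs = [3, 1, 4, 5, 2]
snacks = [1, 2, 0, 1, 3]

cov = covariance_matrix([produce, dairy_eggs, snacks])
corr = correlation_matrix(cov)
print(round(first_component_share(cov), 3), round(first_component_share(corr), 3))
```

In this toy example produce has by far the largest variance, so the first component explains a larger share of the covariance matrix than of the correlation matrix.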
Non-scaled PCA (Covariance)
We observe that the first and second components explain 46.69% and 13.76% of the variance of the data, respectively. Following the rule of thumb of retaining enough dimensions to explain at least 75% of the variation, we select components 1 to 5, which together explain about 79.8% of the variance (the first four explain only 74.1%).
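The rule of thumb can be applied directly to the eigenvalues printed in this section. A minimal Python sketch; the function name `components_for_threshold` is ours, and the 75% threshold is the rule of thumb quoted above rather than a fixed standard.

```python
from itertools import accumulate

# Eigenvalues of the non-scaled PCA (comp 1 to comp 21), copied from the output in this section
eigenvalues = [12.6419, 3.7269, 2.1298, 1.5768, 1.5351, 1.1152, 0.6467, 0.6098,
               0.5149, 0.4686, 0.4194, 0.3796, 0.3128, 0.2797, 0.2621, 0.1236,
               0.1212, 0.1040, 0.0833, 0.0146, 0.0107]


def components_for_threshold(eigenvalues, threshold=0.75):
    """Smallest k whose first k eigenvalues explain at least `threshold` of the total."""
    total = sum(eigenvalues)
    for k, running in enumerate(accumulate(eigenvalues), start=1):
        if running / total >= threshold:
            return k
    return len(eigenvalues)


k = components_for_threshold(eigenvalues)
explained = sum(eigenvalues[:k]) / sum(eigenvalues)
print(k, round(100 * explained, 1))  # 5 79.8
```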
Our findings:
1. Produce has the highest variance. It is strongly positively correlated with Dim1 and negatively correlated with Dim2.
2. Most of the other departments, including the variables with the second- to sixth-largest variance (dairy eggs, snacks, frozen, beverages and pantry), are positively correlated with both Dim1 and Dim2.
#> eigenvalue percentage of variance
#> comp 1 12.6419 46.6895
#> comp 2 3.7269 13.7642
#> comp 3 2.1298 7.8658
#> comp 4 1.5768 5.8236
#> comp 5 1.5351 5.6694
#> comp 6 1.1152 4.1186
#> comp 7 0.6467 2.3883
#> comp 8 0.6098 2.2523
#> comp 9 0.5149 1.9017
#> comp 10 0.4686 1.7308
#> comp 11 0.4194 1.5490
#> comp 12 0.3796 1.4018
#> comp 13 0.3128 1.1552
#> comp 14 0.2797 1.0329
#> comp 15 0.2621 0.9681
#> comp 16 0.1236 0.4563
#> comp 17 0.1212 0.4476
#> comp 18 0.1040 0.3840
#> comp 19 0.0833 0.3076
#> comp 20 0.0146 0.0538
#> comp 21 0.0107 0.0397
#> cumulative percentage of variance
#> comp 1 46.7
#> comp 2 60.5
#> comp 3 68.3
#> comp 4 74.1
#> comp 5 79.8
#> comp 6 83.9
#> comp 7 86.3
#> comp 8 88.6
#> comp 9 90.5
#> comp 10 92.2
#> comp 11 93.8
#> comp 12 95.2
#> comp 13 96.3
#> comp 14 97.3
#> comp 15 98.3
#> comp 16 98.8
#> comp 17 99.2
#> comp 18 99.6
#> comp 19 99.9
#> comp 20 100.0
#> comp 21 100.0
#> Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
#> canned goods 0.22151 1.20e-01 0.00149 0.122510 -0.000978
#> dairy eggs 0.89270 1.35e+00 -0.89979 -0.249501 0.003429
#> produce 3.38519 -5.46e-01 0.12652 -0.018243 0.027935
#> beverages 0.08314 5.02e-01 0.50253 -0.087091 1.104763
#> deli 0.16873 1.60e-01 0.03477 0.037581 -0.022161
#> frozen 0.26363 5.67e-01 0.19967 1.133036 -0.108314
#> pantry 0.27934 2.71e-01 0.02251 0.149302 -0.010888
#> snacks 0.27092 8.93e-01 1.00108 -0.410478 -0.539367
#> bakery 0.15404 2.03e-01 -0.00956 0.036576 -0.005598
#> household -0.01834 1.16e-01 0.06346 0.031781 0.091937
#> meat seafood 0.12813 6.36e-02 -0.01244 0.038850 -0.002224
#> personal care -0.00385 5.93e-02 0.03544 0.020033 0.034508
#> dry goods pasta 0.16510 1.59e-01 -0.00625 0.103859 -0.016606
#> babies 0.05536 6.88e-02 -0.02009 0.008534 -0.014092
#> missing 0.03241 2.57e-02 0.00731 0.008221 0.003323
#> other 0.00254 3.32e-03 0.00223 0.001340 0.001498
#> breakfast 0.07014 1.66e-01 0.04165 0.002389 -0.015789
#> international 0.05296 2.64e-02 0.00637 0.020592 -0.002705
#> alcohol -0.02207 1.25e-03 0.00601 -0.000461 0.005266
#> bulk 0.00743 7.95e-05 0.00184 -0.001773 -0.001129
#> pets -0.00531 1.53e-02 0.00650 0.008148 0.008623
Scaled PCA (Correlation)
We find that the first and second components explain only 13.6% and 6.6% of the variance, respectively, and we need 14 components (out of 21) to explain at least 75% of the variation (75.4%). This suggests that the correlations between departments are weak, so PCA on the scaled data cannot usefully reduce the number of dimensions.
#> eigenvalue percentage of variance
#> comp 1 2.861 13.62
#> comp 2 1.382 6.58
#> comp 3 1.167 5.55
#> comp 4 1.049 4.99
#> comp 5 1.035 4.93
#> comp 6 1.008 4.80
#> comp 7 0.990 4.71
#> comp 8 0.972 4.63
#> comp 9 0.944 4.49
#> comp 10 0.931 4.43
#> comp 11 0.903 4.30
#> comp 12 0.874 4.16
#> comp 13 0.871 4.15
#> comp 14 0.839 4.00
#> comp 15 0.807 3.84
#> comp 16 0.791 3.77
#> comp 17 0.772 3.67
#> comp 18 0.760 3.62
#> comp 19 0.736 3.51
#> comp 20 0.714 3.40
#> comp 21 0.595 2.83
#> cumulative percentage of variance
#> comp 1 13.6
#> comp 2 20.2
#> comp 3 25.8
#> comp 4 30.8
#> comp 5 35.7
#> comp 6 40.5
#> comp 7 45.2
#> comp 8 49.8
#> comp 9 54.3
#> comp 10 58.8
#> comp 11 63.1
#> comp 12 67.2
#> comp 13 71.4
#> comp 14 75.4
#> comp 15 79.2
#> comp 16 83.0
#> comp 17 86.6
#> comp 18 90.3
#> comp 19 93.8
#> comp 20 97.2
#> comp 21 100.0